(57) ABSTRACT

A method for digital watermarking and, in particular, for 
digital data hiding of significant amounts of data in images 
and video. The method employs a discrete wavelet transform 
for embedding gray scale images which can be as great as 
25% of the host image data. A simple control parameter is 
used that can be tailored to either hiding or watermarking 
purposes, and is robust to operations such as JPEG com- 
pression. The method also uses noise-resilient channel codes 
based on multidimensional lattices which can provide for 
embedding signature data such as gray-scale or color 
images. Furthermore, embedded image data can be recov- 
ered in the absence of the original host image by inserting 
the data into the host image in the DCT domain by encoding 
the signature DCT coefficients using a lattice coding scheme 
before embedding, checking each block of host DCT coef- 
ficients for its texture content, and appropriately inserting 
the signatured codes depending on a local texture measure. 
The method further provides for source coding the signature 
data by vector quantization, where the indices are embedded 
in the host by perturbing it using orthogonal transform 
domain vector perturbations. The transform coefficients of 
the parent data are grouped into vectors, and the vectors are 
perturbed using noise-resilient channel codes derived from 
multidimensional lattices. The perturbations are constrained 
by a maximum allowable mean-squared error that can be 
introduced in the host. Also, speech can be hidden in video 
by wavelet transforming the host video frame by frame, and 
perturbing vectors of coefficients using lattice channel codes 
to represent hidden vector quantized speech. The embedded 
video is subjected to H.263 compression before retrieving 
the hidden speech. 
METHOD FOR EMBEDDING AND EXTRACTING DIGITAL DATA IN IMAGES
AND VIDEO

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority from U.S. provisional
application Ser. No. 60/071,581 filed on Jan. 15, 1998.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under
Grant Nos. 94-1120 and 97-04785 awarded by the National
Science Foundation, Grant No. NAGW 3951 awarded by the
National Aeronautics and Space Administration, and Grant
No. N00014-95-1-1214 awarded by the Office of Naval
Research. The Government has certain rights in this invention.

REFERENCE TO A MICROFICHE APPENDIX

Not Applicable

NOTICE OF MATERIAL SUBJECT TO COPYRIGHT PROTECTION

All of the material in this patent document is subject to
copyright protection under the copyright laws of the United
States and of other countries. The owner of the copyright
rights has no objection to the facsimile reproduction by
anyone of the patent document or the patent disclosure, as it
appears in the United States Patent and Trademark Office file
or records, but otherwise reserves all copyright rights
whatsoever.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention pertains generally to encoding and decoding
data, and more particularly to a method for embedding
data in still images and video frames.

2. Description of the Background Art

As multimedia data becomes widespread, such as on the
internet, there is a need to address issues related to the
security and protection of such data, as well as to ensure
copyright protection. Most multimedia data sources are
readily accessible to, and downloadable by, all users of the
internet. While access restriction can be provided using
electronic keys, they do not offer protection against further
(illegal) distribution of such data.

Digital watermarking is one approach to managing this
problem by encoding user or other copyright information
directly in the data. The purpose of digital watermarking is
not to restrict use of multimedia resources, but to resist
attack from unauthorized users.

While watermarking of image data could be visible, such
as a background transparent signature, a visible watermark
may not be acceptable to users in some contexts. Therefore,
it is preferable to digitally watermark an image by invisibly
hiding signature information in the host image. The
signature is then recovered using an appropriate decoding
process.

In order to be effective, an invisible watermark should be
secure, reliable, and resistant to common signal processing
operations and intentional attacks. Recovering the signature
from the watermarked media could be used to identify the
rightful owners and the intended recipients as well as to
authenticate the data. In this paper we are mainly interested
in embedding data such that the signature is invisible in the
host image. The challenge is to simultaneously ensure that
the watermarked image be perceptually indistinguishable
from the original, and that the signature be recoverable even
when the watermarked image has been compressed or
transformed by standard image processing operations.

Research on digital watermarking can be categorized into
two broad classes depending on the data embedding domain.
One such class is based on embedding data in the spatial
domain, while the other is based on injection in the
frequency or transform domain. Most of the recent research on
watermarking emphasizes the transform domain approach.
Targeted applications include watermarking for copyright
protection or authentication. Typically, the data used to
represent the digital watermarks are a very small fraction of
the host image data. Such signatures include, for example,
pseudo-random numbers, trademark symbols and binary
images. Spatial domain methods usually modify the
least-significant bits of the host image, and are, in general, not
robust to operations such as low-pass filtering. Much work
has also been done in modifying the data in the transform
domain. These include DCT domain techniques and wavelet
transforms.

While most of the contemporary research on watermarking
concentrates on copyright protection in internet data
distribution, a different kind of watermarking, commonly
known as data hiding, is at present receiving considerable
attention. Data hiding is a generalization of watermarking
wherein perceptually invisible changes are made to the
image pixels for embedding additional information in the
data. Data hiding is intended to hide larger amounts of data
into a host source, rather than just to check for authenticity
and copyright information. In fact, the problem of
watermarking or copyright protection is a special case of the
generic problem of data hiding, where a small signature is
embedded with greater robustness to noise.

Data hiding provides a mechanism for embedding control,
descriptive, or reference information in a given signal. For
example, this information can be used for tracking the use of
a particular video clip, e.g., for pay-per-use applications,
including billing for commercials and video and audio
broadcast. Data hiding could be quite challenging if one
considers embedding one image in another image.

There has also been work on data hiding in color images.
One method is to use an amplitude modulation scheme
wherein signature bits are multiply embedded by modifying
values in the blue channel. The blue channel is chosen
as the human visual system is less sensitive to blue than
other primary colors. Also, changes in regions of high
frequencies and high luminance are less perceptible, and
thus are favorable locations for data embedding. Robustness
is achieved by embedding the signature several times at
many different locations in the image. Another approach is
to use the S-CIELAB, a well-known standard for measuring
color reproduction errors. In that approach,
amplitude-modulated sinusoidal signals are embedded into the
yellow-blue color band of an opponent-color representation scheme.

It will also be appreciated that, in perceptual data hiding,
one is interested in embedding and recovering high quality
multimedia data, such as images, video and audio. The host
multimedia data itself could be subject to signal processing
operations, typically compression. Depending on the end
user application, both lossy and lossless data embedding is
of interest. Like in digital watermarking, two scenarios are
possible. One is that the original host into which the data is
embedded is available. Alternatively, the original host
information may not be available. This is a much more difficult
problem.

Data hiding can also be used for transmitting different 
kinds of information securely over an existing channel 
dedicated for transmitting something else, such as transmit- 
ting hidden speech over a channel meant for transmitting 
H.263 video, as in this work. Since a substantial amount has 
already been invested in the development of the software 
and hardware infrastructure for standard-based data 
transmission, it makes monetary sense to try to use the same 
for transmission of secure or non-standard data. 

BRIEF SUMMARY OF THE INVENTION 

In general terms, the present invention pertains to a data 
embedding scheme that is suitable for both watermarking 
and image data hiding. While watermarking requires robust- 
ness to image manipulation, data hiding requires that there 
is very little visible distortion in the host image. While much 
of the previous work used signature data that is a small 
fraction of the host image data, the present invention can 
easily handle gray-scale images that could be as much as 
25% of the host image. 

In accordance with one aspect of the invention, in recov- 
ering the signature image, it is assumed that the original host 
image is available. The invention distributes the signature 
information in the discrete wavelet transform (DWT) 
domain of the host image. Spatial distribution of the DWT 
coefficients helps to recover the signature even when the 
images are compressed using JPEG lossy compression. In 
some of the recent work on using wavelets for digital 
watermarking, the signatures were encoded in all DWT 
bands. Such an embedding is sensitive to operations that 
change the high frequency content without degrading the 
image quality significantly. Examples include low pass 
filtering for image enhancement and JPEG lossy compres- 
sion. In contrast, the present invention focuses on hiding the 
signature mostly in the low frequency DWT bands, and 
stable reconstruction can be obtained even when the images 
are transformed, quantized (as in JPEG), or otherwise modi- 
fied by enhancement or low pass filtering operations. 

In accordance with another aspect of the invention, it is 
also assumed that the host image is available. The invention 
provides a robust data hiding technique using channel codes 
derived from a finite subset of general n-dimensional
lattices. In particular we use the $D_n$ lattice, which consists of all
integer n-tuples with an even sum. As the quantity of
embedded data increases, higher order shells of the lattice 
are included in the channel code to accommodate them. 
Using this approach, a gray-scale image of as much as half 
the size of the host image can be embedded by perturbing the 
host wavelet coefficients. 

The embedding and extraction stages of a digital watermarking
system are analogous to the encoder and decoder of a digital
communication system. Like communication channel noise, the
watermarked image might undergo undesirable transformations:
for example, intentional manipulations to remove or degrade
the watermark, or typical signal processing operations such as
compression that may affect the watermark. We use a
wavelet-based compression scheme and the JPEG compression scheme
to manipulate the watermarked image before attempting
retrieval. Our experimental results indicate that there are
no visible distortions in the watermarked image, and the
recovered signature is similar to the original signature even
after 75% wavelet compression and 85% JPEG lossy
compression.
In accordance with a further aspect of the invention, color
signature images are fused in larger color images using
wavelet transforms and lattice structures. We use the YUV
color space for representing color. The Y component is the
luminance part of the signal, and U and V represent the
chrominance components. Adopting the YUV color space
facilitates a simple extension from images to digital video
such as those in the MPEG format. The U, V components are
down-sampled by a factor of two. In this method, the host
and signature images are first wavelet transformed using the
discrete Haar wavelet transform. The wavelet coefficients
are then encoded using channel codes derived from a finite
subset of the lattice structure, which consists of all integer
N-tuples with constraints. As the quantity of embedded data
increases, higher order shells of the lattice structure are
included in the channel code to accommodate them.

In accordance with a further aspect of the invention, a
spatial domain embedding method for data hiding speech
and video in compressed video is presented based on bit
replacement. Spatial domain strategies are quite sensitive to
transformations on the embedded signal. Compared to
conventional techniques, the invention can embed a significantly
larger amount of signature data into the host, up to 25% of
the host data, with little or no perceptual distortion.

An object of the invention is to embed a significant
amount of data in images and/or video.

Another object of the invention is to provide for including
quality control in data transmission (e.g., self-enhancing
images), embedding control information in audio/visual bit
streams, in addition to watermarking.

Further objects and advantages of the invention will be 
brought out in the following portions of the specification, 
wherein the detailed description is for the purpose of fully 
disclosing preferred embodiments of the invention without 
placing limitations thereon. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention will be more fully understood by reference
to the following drawings which are for illustrative purposes
only:

FIG. 1 is a block diagram of a method for embedding
gray-scale images using a discrete wavelet transform
according to the invention, where the signature image is
assumed to be one quarter the size of the host image, and
where there is shown an expansion of a single signature
coefficient to a 2x2 block of coefficients for embedding in
the host image.

FIG. 2 is a graph showing the presence of a signature in
a lossy compressed image where the host is a Lena image
and the signature is a tiger image.

FIG. 3 is a graph showing the presence of a signature in
a lossy compressed image where the host is a cityscape
image and the signature is an airplane image.

FIG. 4 is a diagram showing possible p-ary perturbations
of a host vector where all points are shown in n-dimensional
space.

FIG. 5 is a diagram showing possible noisy vector
positions of an original perturbed vector $s_i$ after transformation
where all points are shown in n-dimensional space.

FIG. 6 is a block diagram showing a data embedding and
extraction method using multidimensional lattices according
to the present invention.

FIG. 7 is a block diagram of the encoder block shown in
FIG. 6 for encoding gray scale images.

FIG. 8 is a block diagram of the decoder block shown in
FIG. 6.
FIG. 9 is a diagram showing the decision boundary within
each of a plurality of shell perturbed lattice points. 

FIG. 10 is a graph showing the presence of a hat-girl
signature in JPEG lossy compressed images for $\alpha=10$, $p=32$;
$\alpha=15$, $p=32$; $\alpha=10$, $p=144$; and $\alpha=15$, $p=144$.

FIG. 11 is a block diagram of an alternative embodiment 
of the encoder shown in FIG. 7 for embedding color images. 

FIG. 12 is a diagram showing determination of the closest 
vector from the observed vector within each of a plurality of 
shell perturbed lattice points. 

FIG. 13 is a graph showing similarity results for color data 
embedding. 

FIG. 14 is a graph showing PSNR results for color data 
embedding. 

FIG. 15 is a block diagram of a method for data embed- 
ding for reconstruction without the host image according to 
the present invention where data is embedded in the block 
DCT domain, signature DCT coefficients are quantized, 
coded using lattice codes, and adaptively embedded into the 
host DCT coefficients using texture masking. 

FIG. 16 is a diagram showing a sample signature quan- 
tization matrix for an 8x8 DCT coefficient block, requiring 
112 host image coefficients to encode.

FIG. 17 is a diagram showing partitioning of the DCT 
block of FIG. 16 for signal insertion (shaded regions) where 
18 coefficients are used in each block. 

FIG. 18 is a diagram showing a sample signature quan- 
tization matrix requiring 192 host coefficients. 

FIG. 19 is a diagram showing partitioning of the DCT 
block of FIG. 18 where the host coefficients are distributed 
over 16 blocks, 12 coefficients per block, as shown by the 
shaded regions. 

FIG. 20 is a block diagram of the encoder block shown in 
FIG. 15. 

FIG. 21 is a graph showing the PSNR of embedded and 
recovered host images as a function of JPEG compression 
ratio with a scale factor of 5, wherein the solid lines 
represent 6% embedding using the quantization matrices of 
FIG. 18 and FIG. 19, and wherein the dashed lines show the
results at 25% embedding using the quantization matrices of 
FIG. 16 and FIG. 17. 

FIG. 22 is a graph showing the PSNR of the recovered 
signature image for the images of FIG. 21 as a function of 
JPEG compression ratio with a scale factor of 5, wherein the 
solid lines represent 6% embedding using the quantization 
matrices of FIG. 18 and FIG. 19, and wherein the dashed 
lines show the results at 25% embedding using the quantization
matrices of FIG. 16 and FIG. 17.

FIG. 23 is a schematic showing the data hiding and 
watermarking problem. 

FIG. 24 is a diagram showing the principle of data 
embedding in relation to FIG. 23. 

FIG. 25 is a diagram showing the principle of data 
extraction in relation to FIG. 24.

FIG. 26 is a diagram showing a two stage wavelet 
decomposition of each frame for recovery from a video host 
without the original video, where the data is hidden in the 
shaded LL-HH subband after zeroing. 

FIG. 27 is a schematic showing a method for data 
encoding in video according to the present invention using the
zeroed LL-HH subband. 

FIG. 28 is a schematic showing a method for data 
decoding in video according to the present invention using 
the zeroed LL-HH subband. 
FIG. 29 is a graph showing the SNR of extracted hidden
male speech vs. bit rate for an H.263 compressed "News" bit
stream at 15 frames/s for $D_4$, $E_8$ and $\Lambda_{16}$ lattice
implementations of the data hiding and recovery method depicted in
FIG. 27 and FIG. 28.

FIG. 30 is a graph showing the SNR of extracted hidden
female speech vs. bit rate for an H.263 compressed
"grandmother" bit stream at 7.5 frames/s for $E_6$, $K_{12}$ and G24 lattice
implementations of the data hiding and recovery method
depicted in FIG. 27 and FIG. 28.

DETAILED DESCRIPTION OF THE
INVENTION

Referring more specifically to the drawings, for
illustrative purposes the present invention is described with
reference to FIG. 1 through FIG. 30. It will be appreciated that
the invention may vary as to configuration and methodology
without departing from the basic concepts as disclosed
herein.

1. Data Embedding

A watermark should be robust to typical image processing
operations, including lossy compression. Compression
techniques, such as JPEG, typically affect the high
frequency components. This is also true with most perceptual
coding techniques. For these reasons, a digital signature
should be placed in perceptually salient regions in the data.
For techniques based on frequency domain modifications,
this implies embedding the signature in mostly low
frequency components. Inserting a signature in low frequency
components creates problems if one is interested in invisible
watermarks. This is particularly true in data hiding
applications where the data to be hidden could be a significant
percentage of the original data.

To address this problem, the present invention uses a
wavelet transform to embed signature information in
different frequency bands. For experimental purposes we used the
discrete Haar wavelet basis; however, those skilled in the art
will appreciate that extending the invention to another
wavelet basis is reasonably straightforward. Both the
signature data, which in our case is another image, and the host
image data, are decomposed using the discrete Haar wavelet
transform (DHWT).
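A minimal sketch of a one-level two-dimensional Haar decomposition of the kind described above, written in plain Python for small arrays. The function name `haar_dwt2`, the orthonormal scaling by 1/sqrt(2), and the convention that the first letter of a band name refers to the horizontal filter are assumptions made for illustration; the specification does not fix any of them.

```python
import math


def haar_dwt2(image):
    """One-level 2-D Haar transform of a list-of-lists image.

    Returns the four bands (LL, LH, HL, HH), each half the size of the
    input in both directions. Band naming is an assumed convention:
    first letter = horizontal filter, second letter = vertical filter.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    s = math.sqrt(2.0)  # orthonormal Haar scaling (assumption)

    # Horizontal pass: scaled sums (low-pass) and differences (high-pass)
    # of neighboring pixel pairs in each row.
    lo = [[(r[2 * j] + r[2 * j + 1]) / s for j in range(cols // 2)] for r in image]
    hi = [[(r[2 * j] - r[2 * j + 1]) / s for j in range(cols // 2)] for r in image]

    def vertical(band):
        # Vertical pass: same sums and differences over neighboring row pairs.
        low = [[(band[2 * i][j] + band[2 * i + 1][j]) / s
                for j in range(cols // 2)] for i in range(rows // 2)]
        high = [[(band[2 * i][j] - band[2 * i + 1][j]) / s
                 for j in range(cols // 2)] for i in range(rows // 2)]
        return low, high

    ll, lh = vertical(lo)
    hl, hh = vertical(hi)
    return ll, lh, hl, hh
```

With this scaling the transform preserves energy, so a flat image puts all of its energy in the LL band and leaves the three detail bands at zero.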

In the following discussion it is assumed that the signature
image is one quarter the size of the host image, and both
images are gray scale, one byte per pixel. Embedding occurs
in the wavelet transform domain as the wavelet coefficients
are combined to create a watermarked image. It is assumed
that the host image is available for signature image recovery.
A schematic of this approach is shown in FIG. 1.

The basic steps in embedding the signature coefficients
into the host image coefficients are:

1. Decompose by one level the host and signature images
using the DHWT. This results in four bands, which are
usually referred to as the LL, LH, HL, and the HH bands as
shown in block 10.

2. Each signature image coefficient is expanded into a 2x2
block as follows:

(a) Each coefficient value is linearly scaled to a 24 bit
representation as shown in block 12.

(b) Let A, B, C represent, respectively, the most
significant byte, the middle byte, and the least significant byte
in a 24 bit representation. Three 24-bit numbers, A', B',
C', are generated with their most significant bytes set to
A, B, and C, respectively, and with their two least
significant bytes set to zero as shown in block 14. Then
a 2x2 expanded block is formed as shown in block 16.
3. The host image coefficients are also linearly scaled
within each band to a 24 bit representation. The minimum
and maximum values in each band will be used in the
inverse transformation described below.

4. The scaled host image coefficients are now added to the
expanded signature transform to form a new fused
transform. Let h(m, n) be the (m, n)th wavelet coefficient of the
host image, and let s(m, n) be the (m, n)th signature
coefficient after forming the expanded blocks as described in
Step 2. Note that after expansion, each of the bands in the
signature wavelet transform is of the same dimension as the
host image bands. The fused (m, n)th coefficient is then
computed as:

$$w(m, n) = \alpha\, h(m, n) + s(m, n) \tag{1}$$

where the scale factor $\alpha$ determines the relative percentage
of the host and signature image components in the new
image.

5. The fused transform coefficients in each band are scaled
back to the levels of the host image transform coefficients
using the minimum and maximum coefficient values in Step
3.

6. An inverse transform is now computed to give the
watermarked image.

EXAMPLE 1

We present here results of embedding 128x128 gray scale
(one byte per pixel) signature images in a 256x256 Lena
image. Two images, a hat girl picture and a picture of
a tiger, were used as signature images in the following
experiments. Scale factors of $\alpha=5$, $\alpha=7$, and $\alpha=11$ were
used.

We noted that the higher the scale factor, the better the
quality of the embedded image (i.e., less distortion due to
embedding). Even if the signature image has much texture
information like a tiger picture, the embedded image cannot
be visually distinguished from the original host image.

Two sets of experiments were conducted. In the first, for
data hiding applications, results of signature image
reconstruction from JPEG lossy compressed images for varying
scale factors were determined. In the second, for
watermarking applications, we determined the results of signature
detection from these lossy compressed images.

For data hiding purposes it is reasonable to choose a larger
scale factor in Equation (1) because we are not too
concerned about degradation due to image processing
operations. In hiding one image in another, it is more important to
ensure that the quality of the watermarked image is as close
to the original as possible, with very little visual distortion.
Almost perfect reconstruction is possible when there is no
further image processing of the watermarked images.

On the other hand, for copyright protection and
authentication purposes it is important that the watermarked
images are robust to typical image processing operations. In
such cases it is reasonable to assume that the signatures
consume significantly fewer bytes than the host image and
as such can be spatially distributed. In our experiments we
used lossy JPEG compression where the signatures are
gray scale images, and it is reasonable to expect that one can
obtain much better results if the signatures are binary images
of much lower dimensions. Lower values for the scale factor
in Equation (1) should be used when it is likely that the
images undergo significant distortion. We recovered
signatures for JPEG compression of 93% for scale factors of $\alpha=3$
to $\alpha=11$. As expected, images embedded with larger scale
factor resulted in poor reconstruction for the same
compression factor.

In checking for the presence of a signature, the quality of
the reconstruction of the signature itself is not an issue. A binary
decision for the presence or absence of a signature needs to
be made. We used a measure of "similarity" S to compute
the cross correlation between the recovered signature s*(m,
n) and the original signature s(m, n) in the wavelet transform
domain. This similarity is defined as:

$$S = \frac{\sum_{m,n} s^*(m, n)\, s(m, n)}{\sum_{m,n} \left(s^*(m, n)\right)^2} \tag{2}$$

Note that the similarity computed as above does not
guarantee that the maximum value is 1.0. Graphs of this
similarity for varying JPEG compression and for different scale
factors for two different examples are shown in FIG. 2 and
FIG. 3. In both graphs, the scale factors were $\alpha=5$, $\alpha=7$, $\alpha=9$
and $\alpha=11$. As can be seen from those graphs, it is easy to find
a threshold for signature detection between unwatermarked
and watermarked images.

The foregoing method can be used for both digital
watermarking related applications as well as for data hiding
purposes. The scale factor in Equation (1) controls the
relative amount of host and signature image data in the
embedded image. A larger scale factor can be used for data
hiding where it is important to maintain the quality
of the embedded image. A lower scale factor is better
for watermarking where robustness to typical image
processing operations is needed. Experimental results
demonstrate that good quality signature recovery and
authentication is possible when the images are quantized and JPEG
compressed by as much as 90%.

It will be appreciated that, even though the Haar wavelet
basis was used in the experiments, the method can be easily
adapted to other wavelet transforms and for more than one
level of decomposition. It might be worth exploring the use
of other basis functions depending on the characteristics of
the host and signature images. In some cases, particularly
when the host image background lacks texture whereas the
signature image has a lot of texture, one can see a noisy
background in the embedded image.

In digital watermarking, the signatures are usually of
much smaller dimensions (in terms of number of bytes
needed) compared to the host image. Since the method
described above can manage a significantly larger amount of
signature data, it is possible to distribute the signature
spatially as well, thus making watermarking robust to
operations such as image cropping.

2. Multidimensional Lattice Channel Code

2.1 Methodology

If the original host image is available, the operations of
data injection and retrieval are, in fact, very similar to the
channel coding and decoding operations in a typical digital
communication system. Channel coding refers to the gamut
of signal processing done before transmission of data over a
noisy channel. In watermarking in the transform domain, the
original host data is transformed, and the transformed
coefficients are perturbed by a small amount in one of several
possible ways in order to represent the signature data. When
the watermarked image is compressed or modified by other
image processing operations, noise is added to the already
perturbed coefficients. The retrieval operation subtracts the
received coefficients from the original ones to obtain the
noisy perturbations. The true perturbations that represent the
injected data are then estimated from the noisy data as best
as possible.
In the present invention, we have adopted a vector-based
approach to hidden data injection. We group N transform
coefficients to form an N-dimensional vector, and modify it
by codes that represent the data to be embedded. The
motivation for using vector perturbations as opposed to
scalar perturbations follows from the realization that higher
dimensional constellations usually result in lower
probability of error for the same rate of data injection and the same
noise statistics.

FIG. 4 and FIG. 5 show the basic concept of the
perturbation vector in the host N-dimensional vector space. In both
figures, "x" represents a host vector in an N-dimensional
space. To embed data from a p-ary source with symbols
$\{s_1, s_2, \ldots, s_p\}$, we perturb the original vector so that the
perturbation coincides with one of p corresponding channel
codes. The perturbed vector is denoted by one of the "o"s in
the figures, depending on the particular source symbol it
represents. After the watermarked image has undergone
compression or other transformations, a perturbed vector
representing, for example, symbol $s_i$ in the diagram, may be
received as a noisy vector, as in FIG. 5. It is then an
estimation problem to extract the transmitted symbol from
the vector received. Assuming an additive Gaussian noise
model, the received vector is decoded as representing the
symbol whose channel code it is closest to in Euclidean
distance.
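The perturb-then-decode idea above can be sketched in a few lines of plain Python. The names `embed_symbol` and `decode_symbol`, the three-entry codebook in the usage note, and the sign convention (received minus host) are illustrative assumptions, not details taken from the specification.

```python
def embed_symbol(host, symbol, codebook):
    """Perturb a host coefficient vector by the channel code of `symbol`."""
    return [h + c for h, c in zip(host, codebook[symbol])]


def decode_symbol(host, received, codebook):
    """Minimum Euclidean distance decoding with the host available.

    The noisy perturbation is taken as received - host (assumed sign
    convention); the decoded symbol is the index of the closest code.
    """
    noisy = [r - h for r, h in zip(received, host)]

    def sq_dist(code):
        # Squared distance gives the same ordering as Euclidean distance.
        return sum((n - c) ** 2 for n, c in zip(noisy, code))

    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
```

For example, with a hypothetical codebook `[(1, 1, 0, 0), (1, -1, 0, 0), (0, 0, 1, 1)]`, embedding symbol 2 in the host `[10.0, 20.0, 30.0, 40.0]` and adding small noise to the result still decodes to 2, because the noisy perturbation stays closest to the third code.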

Codes derived as subsets of multidimensional lattices 
have been shown to be very efficient for channel coding. In 
the following, we describe the general concept of lattices, 
and in particular, the $D_4$ lattice that was used in our data
embedding algorithm. 

2.2 Lattice Structures 

The Voronoi regions of various n-dimensional lattices can 
be used to construct n-dimensional quantizer cells for uni- 
formly distributed inputs. It is known that some of these 
lattices produce very good channel codes, and yield high 
values of nominal coding gain. That is, for the same power 
constraint on the channel, the channel codes are maximally 
separated from each other so that they are most robust to 
noise. The lattices considered here are the root lattices and
their duals, namely $A_n$, $A_n^*$, $D_n$, $D_n^*$, $E_6$, $E_8$, etc. If
$a_1, \ldots, a_n$ are n linearly independent vectors in an
m-dimensional Euclidean space with $m \ge n$, the set of all vectors

$$x = u_1 a_1 + \cdots + u_n a_n \qquad (3)$$

where $u_1, \ldots, u_n$ are arbitrary integers, constitutes an
n-dimensional lattice $\Lambda_n$. Further, if $\Lambda$ is a lattice in
$\mathbb{R}^m$, the dual lattice $\Lambda^*$ consists of all points x in the
span of $\Lambda$ such that $x \cdot y \in \mathbb{Z}$ for all $y \in \Lambda$. Some
common lattices and definitions are presented below.

For $n \ge 1$, $A_n$ is the n-dimensional lattice consisting of the
points $(x_0, x_1, \ldots, x_n)$ in $\mathbb{Z}^{n+1}$ with $\sum x_i = 0$.

For $n \ge 2$, $D_n$ consists of the points $(x_1, x_2, \ldots, x_n)$ in
$\mathbb{Z}^n$ with $\sum x_i$ even. In other words, if we color the integer
lattice points alternately red and blue in a checkerboard coloring,
$D_n$ consists of the red points. In 4 dimensions, the $D_4$ lattice
is known to yield the best coding gain.

The $E_6$, $E_8$ and $\Lambda_{16}$ lattices give very good channel coding
gains in 6, 8, and 16 dimensions respectively. The $E_8$ lattice
is derived from the $D_8$ lattice, and is defined as the union of
$D_8$ and the coset

$$\left(\tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12\right) + D_8.$$

In other words, $E_8$ consists of the points $(x_1, \ldots, x_8)$ with
$x_i \in \mathbb{Z}$ and $\sum x_i$ even, together with the points
$(y_1, \ldots, y_8)$ with $y_i \in \mathbb{Z} + \tfrac12$ and $\sum y_i$ even.
$E_6$ is a subspace of dimension 6 in $E_8$, consisting of the points
$(u_1, u_2, \ldots, u_8)$ with $u_6 = u_7 = u_8$.
For an n-dimensional lattice $\Lambda$, the Voronoi region around
any lattice point is the set of points in $\mathbb{R}^m$ closest to the
lattice point. Therefore, the Voronoi region V(0) around the
origin is given as:

$$V(0) = \{\, x \in \mathbb{R}^m : \|x\| \le \|x - u\| \ \text{for all nonzero } u \in \Lambda \,\} \qquad (4)$$

2.3 Description of the $D_4$ Lattice

It is known that some lattices produce very good spherical
codes for channel coding. That is, for the same constraint on
deviation from the true coefficient values, the channel codes
are maximally separated from each other so that they are
most robust to noise.

In general, the $D_4$ root lattice produces the best channel
code in 4 dimensions. It is known that for small noise, this
lattice gives a nominal channel coding gain of 1.414 over
binary encoding. As mentioned earlier, the lattice consists of
the points $(x_1, \ldots, x_4)$ having integer coordinates with an
even sum.

As in all lattices, the lattice points of the $D_4$ lattice fall on
concentric shells of increasing distance from the all zero
vector. For example, the 24 lattice points given by all
permutations of (±1, ±1, 0, 0) lie on the first shell of the
lattice at a distance $\sqrt{2}$ from the center. The second shell, at
distance 2 from the center, again contains 24 lattice points,
8 of which are of type (±2, 0, 0, 0), and 16 of type (±1,
±1, ±1, ±1). Table 1 shows the shell number, the squared
norm, the lattice point types, and the number of lattice points
for the first few shells of the $D_4$ lattice. The superscript "p"
after the points in the table denotes "all permutations of" the
elements constituting it. By choosing appropriate subsets of
points from the lattice, the rate for data embedding can be
varied.
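
The shell structure described above can be checked by brute force. The following is a minimal Python sketch, not part of the original method: the function name `d4_shells` is chosen here for illustration, and it simply enumerates integer 4-tuples with an even coordinate sum and groups them by squared norm.

```python
from collections import Counter
from itertools import product
from math import isqrt


def d4_shells(max_sq_norm):
    """Count D4 lattice points on each shell up to a given squared norm."""
    bound = isqrt(max_sq_norm)  # no coordinate can exceed this in magnitude
    counts = Counter()
    for point in product(range(-bound, bound + 1), repeat=4):
        sq_norm = sum(c * c for c in point)
        # D4 = integer 4-tuples whose coordinates sum to an even number.
        if 0 < sq_norm <= max_sq_norm and sum(point) % 2 == 0:
            counts[sq_norm] += 1
    return dict(sorted(counts.items()))


print(d4_shells(8))  # {2: 24, 4: 24, 6: 96, 8: 24}
```

The first two shells (squared norms 2 and 4) hold 24 points each, which matches the counts quoted in the text; taking more shells enlarges the codebook and hence the embedding rate.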

3. Data Hiding in Images 
3.1 Embedding Procedure 

It is well known that embedding in the low-frequency
bands is more robust to manipulations such as enhancement
and image compression. However, changes made to the low
frequency components may result in visible artifacts.
Modifying the data in a multiresolution framework, such as a
wavelet transform, will provide good quality embedding
with little perceptual distortion.

The schematic diagram 20 in FIG. 6 shows our watermarking
procedure using multidimensional lattice channel
codes. The coefficient vectors perturbed in our implementations
are of dimension 4, and the channel code used to
embed the data is a subset of the $D_4$ lattice. As the quantity
of embedded data increases, higher order shells of the
embedding lattice are included in the channel code to
accommodate them. In this algorithm, a gray-scale image of
as much as half the size of the host image is hidden by vector
based perturbations.

A single level of the discrete wavelet transform (DWT)
decomposition of both the host and the signature image is
made before data embedding. A detailed diagram of the
encoder block 22 from FIG. 6 is shown in FIG. 7. Each
coefficient of the signature image is quantized into p levels.
In order to embed the quantized coefficient information, a set
of n coefficients (n = 4 in the case of $D_4$ lattice embedding) in
the host image is grouped to form an n-dimensional vector,
and the vector is then perturbed according to a p-ary channel
code consisting of a subset of an n-dimensional lattice scaled
by a factor $\alpha$. If $v$ represents a vector of host DWT
coefficients after grouping, and the index of the quantized
signature coefficient is i, then the perturbed vector $\hat{v}$ is given
by:

$$\hat{v} = v + \alpha\, C(s_i) \qquad (5)$$

where $C(s_i)$ represents the channel code (subset of the
n-dimensional lattice) corresponding to the symbol $s_i$, where
$i = 1, \ldots, p$.
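
As an illustration of the perturbation step, here is a minimal Python sketch. It assumes the perturbation is additive, i.e. the host vector plus the scale factor times the chosen code vector, which is how the decoder's subtract-and-rescale step reads. The function names and the ordering of the codebook are hypothetical; the subsets actually used are those of Table 2, which is not reproduced here.

```python
from itertools import combinations, product


def d4_first_shell():
    """All 24 points of type (+-1, +-1, 0, 0): the first shell of D4."""
    codes = []
    for i, j in combinations(range(4), 2):
        for sign_i, sign_j in product((1, -1), repeat=2):
            code = [0, 0, 0, 0]
            code[i], code[j] = sign_i, sign_j
            codes.append(tuple(code))
    return codes


def embed_symbol(host_vec, symbol_index, codebook, alpha):
    """Perturb a 4-vector of host coefficients by alpha times a code vector."""
    code = codebook[symbol_index]
    return tuple(v + alpha * c for v, c in zip(host_vec, code))


codebook = d4_first_shell()  # supports up to p = 24 symbols
host = (120.0, 64.5, -3.0, 8.0)
print(embed_symbol(host, 0, codebook, alpha=10.0))  # (130.0, 74.5, -3.0, 8.0)
```

A larger `alpha` moves the perturbed vector further from the host vector, which is the robustness versus transparency trade-off discussed in the text.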

Each subband of the signature image is embedded into the 
corresponding subband of the host. That is, each coefficient 
in the LL band of the signature image is hidden in four 
coefficients in the LL band of the host, and so on. The scale
factor chosen for embedding in the higher bands is less than
the scale factor chosen for the LL band, by some constant
factors. However, we will refer to the scale factor chosen for
the LL band as $\alpha$.

Various subsets of the 4-dimensional $D_4$ lattice, chosen for
various values of the source quantization levels p that were
used in the experiments, are shown in Table 2. A high value
of p quantizes the signature finely, but $\alpha$ must now be higher
too so that the probability of error is sufficiently low. This in
turn degrades the transparency of the watermarked image.
The choice of the parameters $\alpha$ and p determines the
trade-off between the transparency and the quality of the
hidden data.

For security in copyright protection, we can select special 
regions in the transform domain to embed data, or randomly 
group the coefficients to form a vector using a private key. 
Noise-like pseudo-random sequences can be used for ran- 
dom grouping. It is to be noted, however, that in general, the 
less the quantity of data hidden, the more secure it can be 
made. 

3.2 Extracting Data
3.2.1 Determining the Closest Point

A watermarked image may be subject to lossy compres- 
sion or other simple image processing operations such as 
enhancement. Under the assumption that the resulting per- 
turbations in the wavelet transform domain can be modeled 
by additive Gaussian noise, a nearest-neighbor search with 
the Euclidean distance measure is needed to recover the 
embedded symbols. FIG. 8 provides a diagram of the 
decoder block 24 from FIG. 6 to show the details of symbol 
recovery and signature extraction. 

Recovering the hidden data starts with the same DWT of
the received watermarked image that was used to embed the
data. The true host image coefficients (known to the
retriever) are then subtracted from the coefficients of the
received image to obtain the perturbations. Note that
the recovered perturbations can be "noisy" because of
various possible transformations of the watermarked data.

These coefficients are now grouped into groups of n in the
same manner as they were grouped during encoding
(possibly using the private key) to obtain a vector $e$, and
then scaled by the factor $1/\alpha$. The resulting vector $\tfrac{1}{\alpha} e$ is
then nearest-neighbor encoded to find the index i of the
channel code nearest to it in Euclidean distance. In
particular, we find an index i such that:

$$\left\| C(s_i) - \tfrac{1}{\alpha} e \right\| \le \left\| C(s_j) - \tfrac{1}{\alpha} e \right\| \quad \forall\, j \in \{1, 2, \ldots, p\} \qquad (6)$$

where the $C(s_j)$ refer to the p code-vectors in the channel
codebook. For lattice based channel codes, this is equivalent
to finding the lattice point in whose Voronoi region the
vector $\tfrac{1}{\alpha} e$ lies. From the index i, the quantized DWT
coefficient can be obtained.
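
The recovery step can be sketched as an exhaustive nearest-neighbor search. This is a minimal Python illustration rather than the implementation used in the experiments; the function name `extract_symbol`, the four-entry codebook, and the numeric values are made up for the example.

```python
def extract_symbol(received_vec, host_vec, codebook, alpha):
    """Return the index of the code vector nearest to (received - host) / alpha."""
    # Subtract the known host coefficients and undo the embedding scale.
    e = [(r - v) / alpha for r, v in zip(received_vec, host_vec)]

    def sq_dist(code):
        return sum((c - x) ** 2 for c, x in zip(code, e))

    # Minimum Euclidean distance decoding over the whole codebook.
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


codebook = [(1, 1, 0, 0), (1, -1, 0, 0), (-1, 1, 0, 0), (-1, -1, 0, 0)]
host = (120.0, 64.0, -3.0, 8.0)
# Symbol 1 embedded with alpha = 10 gives (130, 54, -3, 8); compression
# noise is simulated by moving each coordinate slightly.
received = (127.5, 56.0, -1.0, 9.0)
print(extract_symbol(received, host, codebook, alpha=10.0))  # 1
```

The search cost grows linearly with the number of channel symbols p, which is the motivation for the fast lattice decoder of Section 3.2.2 when p is large.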

To present an example, by means of the diagram in FIG.
9, let us say that a perturbed vector corresponding to a
channel code $s_i$ was received as a noisy vector r, "*". As long
as it is inside the decision boundary of the original perturbed
vector $s_i$, we can receive the data perfectly. However, after
general image compression schemes, for example
wavelet-based compression or JPEG coding, or other
transformations like enhancement, the embedded vector may be
strongly manipulated to, say, a noisy vector r', "**", located
outside of the decision boundary. In that case the symbol detected
will not be the original perturbed value $s_i$. To reduce the
incidence of erroneous detection, the algorithm can expand the
decision boundary by using a larger scale factor.

Although the recovered signature image is limited in
quality by the quantization before embedding, the similarity
measure S defined in Equation (2) can be used to distinguish
between watermarked and unwatermarked images. Here
s(m, n) stands for the quantized signature coefficients, and
s*(m, n) stands for the recovered signature coefficients after
lossy compression.

3.2.2 Fast Algorithm

One of the motivations for using lattice based channel
codes in our implementations is the existence of fast encoding
and decoding algorithms. We present a fast encoding
algorithm for the $D_n$ lattice that is used to extract the hidden
symbols from the noisy vectors received, if the number of
channel symbols p is sufficiently large.

The algorithm for finding the closest point of the lattice to
an arbitrary scaled noisy perturbation received, $x = \tfrac{1}{\alpha} e$,
is particularly simple. Note that all points of $D_n$ are
included in the n-dimensional cubic integer lattice $\mathbb{Z}^n$. For a
real scalar number $x \in \mathbb{R}$, let f(x) = closest integer to x. We
define f(x) and the function w(x), which rounds in the wrong
direction, as follows, where m is a nonnegative integer:

If $x = 0$, then f(x) = 0 and w(x) = 1,
If $x > 0$ and $m \le x \le m + \tfrac12$, then f(x) = m and w(x) = m + 1,
If $x > 0$ and $m + \tfrac12 < x < m + 1$, then f(x) = m + 1 and w(x) = m,
If $x < 0$ and $-m - \tfrac12 \le x \le -m$, then f(x) = -m and w(x) = -m - 1,
If $x < 0$ and $-m - 1 < x < -m - \tfrac12$, then f(x) = -m - 1 and w(x) = -m.

We can also write $x = f(x) + \delta(x)$, so that $|\delta(x)| \le \tfrac12$ is the
distance from x to the nearest integer. Then, if
$x = (x_1, x_2, \ldots, x_n)$, the vector f(x) is defined by

$$f(x) = (f(x_1), f(x_2), \ldots, f(x_n)) \qquad (7)$$

and g(x) is defined by

$$g(x) = (f(x_1), \ldots, w(x_k), \ldots, f(x_n)) \qquad (8)$$

where k is the component with the largest error distance
$|\delta(x_k)|$. The nearest point to x in the $D_n$ lattice structure is
chosen as whichever of f(x) and g(x) has an even sum of components.
If x is equidistant from two or more points of the lattice, we
choose the nearest point as the one having the smallest norm.
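
The rounding rule above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the names `f`, `w`, and `closest_dn_point` mirror the text's notation but are chosen here, halves are rounded toward zero as in the case list, and the smallest-norm rule for equidistant points is not handled separately.

```python
import math


def f(x):
    """Closest integer to x, with exact halves rounded toward zero."""
    m = math.floor(abs(x))
    r = m if abs(x) - m <= 0.5 else m + 1
    return int(math.copysign(r, x)) if r else 0


def w(x):
    """The 'wrong way' rounding of x: the next-closest integer."""
    fx = f(x)
    if x == 0:
        return 1
    if x > fx:
        return fx + 1
    if x < fx:
        return fx - 1
    # x is a nonzero integer: step away from zero.
    return fx + (1 if x > 0 else -1)


def closest_dn_point(x):
    """Closest point of D_n (integer vectors with even sum) to x."""
    fx = [f(xi) for xi in x]
    if sum(fx) % 2 == 0:
        return fx
    # Re-round the coordinate with the largest rounding error the wrong way,
    # which changes the coordinate sum by one and so fixes its parity.
    k = max(range(len(x)), key=lambda i: abs(x[i] - fx[i]))
    g = list(fx)
    g[k] = w(x[k])
    return g


print(closest_dn_point([0.6, -1.1, 1.7, 0.1]))  # [1, -1, 2, 0]
print(closest_dn_point([0.9, 0.4, 0.1, 0.0]))   # [1, 1, 0, 0]
```

The cost is linear in the dimension n and independent of the codebook size, in contrast to an exhaustive search over all p code vectors.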

EXAMPLE 2

We used a 256x256 gray scale Lena image as the host, and
two signature images, a hat-girl image and a tiger image,
both of which were 128x128 gray scale. A 1-stage discrete
Haar wavelet transform was used for both the encoder and
the decoder.

We examined the Lena image digitally watermarked with
the hat-girl image, at various scale factors $\alpha$ and various
quantization levels p, without any compression. Note that
the scale factor $\alpha$ controls the relative weight of host and
signature image contributions to the fused image. As $\alpha$
increases, the quality of the watermarked image degrades.
For example, we could see artifacts in the background for
$\alpha = 20$. We found that $\alpha = 10$ appears to be a reasonable value
in terms of the trade-off between quality of the watermarked
image and robustness to signature recovery under image
compression.
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We also examined the signature images recovered from triangular decision boundary shown, s, will be correctly 

the watermarked image after 0%, 65%, 75% and 85% JPEG estimated. Obviously the scale factor a controls the extent of 

compression. In general, most of the recovered signature the regions around each s,. A large scale factor can tolerate 

images were of very high quality for 85% JPEG a large perturbation at the expense of a degradation in the 

compression, when the scale factor a is in the range 10-15. 5 watermarked image quality. 

The quality of the recovered signature with a large scale pr i nc ipal difference for data hiding in color images is 

factor a is obviously much better than those with a smaller mat ^ signa ture images are fused in larger color images 

a. The number of quantizer levels p, on the other hand, usi wavelet ^^fo^ md lattice 

structures. We use the 

determines .the coarseness of quantization and therefore the ^ color for representmg ^Iot. The Y component 

quality of the signature image hidden in the host. 10 is the lummance part of the signaUnd U and V represent the 

FIG. 10 shows the similarity between the original and the chrominance components. Adopting the YUV color space 

recovered signature, when the hat-girl image is embedded faciUtates a ^ k extensioQ £ 0 £\ to di iu , ^ 

into the Ixna image^ Note that good authentication is sucfa ^ those m me Mp£G format ^ v y m m * Qmn]& are 

possible for up to 85% JPEG lossy compression. down-sampled by a factor of two. In this method, the host 

As can be seen from the foregoing, the invention provides 15 and signature images are first wave let transformed used the 

for highly effective data embedding using the D 4 lattice in discrete Haar wavelet transform. The wavelet coefficients 

the DWT domain. The method presented provides a frame- are lhen encode d using channel codes derived from a finite 

work for a more structured digital watermarking scheme, subset of the lattice structure> which consists of all imeger 

aimed at embedding large amounts of data into a host. The N-tuples with constraints. As the quantity of embedded data 

quality of the recovered signature under significant image 20 ^creases, higher order shells of the lattice structure are 

transformations can be improved by using higher dimen- included in the channel code to accommodate them, 
sional lattice structures like the E 8 or the A 16 lattice. Further, 

by proper indexing of the scalar codebook used for the EXAMPLE 3 

wavelet coefficients of the signature image, the recovered Color images were represented in the YUV color space, 

signature quality can be substantially improved for the same 25 We used a 256x256 color hose image and a 128x128 gray 

scale factor of embedding and for the same number of levels sca i e signature image. The signature was injected into the Y 

for quantization. More sophisticated schemes for error component of the transform coefficients of the host image, 

resilience, such as trellis-coded modulation, can also be From observing an 81% JPEG compressed watermarked 

use d. image using 32 channel codes and the same compressed 

4. Color Image Embedding Using Multidimensional Lattice 30 un age us i ng 144 channel codes, we found that there were no 

Structures visible distortions in the watermarked images. Additionally, 

It is known that the human visual system is not very f rom observing the recovered signatures for the two quan- 
sensitive to changes in the higher frequency spectrum, and tization levels, we found the reconstructed images to be of 
as such many of the lossy compression techniques rely on very good quality for authentication purposes, 
saving bits needed to represent the information in these 35 We also examined an example of a color signature embed- 
higher frequencies. For this reason it is important that the ding ^ entire si gnature data was embedded in the Y 
signature data be embedded in the lower frequency compo- component of the host data in order not to distort the color 
nents of the host data. m me watermarked image. For this reason, the size of the 

The schematic 30 in FIG. 11 shows our color image signature image was i ess tnan tha t for a gray scale embed- 

embedding procedure. The basic hiding/extracting scheme is ^ ding We found our image embedding method to be robust, 

similar to the our previous data hiding/extracting technique and concluded that it could be easily extended to video 

using the multidimensional lattice structures described watermarking as well. 

above and shown in FIG. 7. A single level of discrete r j n „, cir \ A „. „, „ P 

, t t - ✓™ ir r\ * u 4k *u u * a *u HG. 13 and FIG. 14 show the similarity of the recon- 

wavelet transformation (DWT) of both the host and the slmctcd . {Q {hQ si . fof varioug 

signature image is made before data embedding Each 45 i eV els of JPEG compression. A normalized similarity func- 

coefficient of the signature image is quantized into p levels. C /v • , * 

1 j . u * a - ** f * ton S(s) is defined as 
In order to embed the quantized coefficient information, a set 

of N coefficients in the host image is grouped to form an 

N-dimensional vector, and the vector is then perturbed s{s) = ^ s * * 

according to a p-ary channel code consisting of a subset of 50 

the lattice scaled by a factor a. If v represents a vector of 

host DWT coefficients after grouping, and the index of the where s is the signature image components organized as a 

quantized signature coefficient is i, then the perturbed vector vector, and S is the reconstructed signature vector. As can be 

is given by Equation (5). seen from the graphs, the watermarked image can be easily 

In signature recovery, the watermarked DWT coefficients 55 authenticated even at 85% lossy JPEG compression. FIG. 14 

are grouped based on the p-ary channel code used in shows Peak Signal to Noise Ratio (PSNR) of the recon- 

...... , -* r«. . . structed image as a function of JPEG compression factor. 

encoding to obtain a new vector e . This is then scaled by r- DCKm . „ . , . lL 4t 4 , .... 

t _ J? & ., , . JjCj .^ • /fV ™ T" e PSNR is computed with respect to the original signature 

the factor 1/a where is as denned in Equation (5). The w~p™ m * a *l * a i-# „ . 

. it _ ... - - — - - before quantization. We noted that good quality reconstruc- 

resultant vector is then nearest-neighbor encoded to find the „ t - - u , , . . , - ctv TO ™ c 

. . . - . , , °\ , , 4 . i( _ „ -.j 60 tion was possible up to about 75% JPEG compression for 

index 1 of the channel code nearest to it in the Euclidean a 15 

distance. In particular, we find an index i sucb that Equation s " ^ Hiding and Reconstruction Host Image 

' o ° -i S me / ^ . * . . .a „ T „ . Thus far, we have discussed image reconstruction where 

Similar to before, this is illustrated m FIG. 12. Assume Wt imano • . u o™ „,u M t u a u^* ■ „ 

. ^ . , , r ■ me host unage is available. However, when the host image 

that the symbol s,. was sent but because of compression or 6J fa unavailabl addjtional ^^ites are involved. A sche- 

some other .mage processing operation, the observed vector matjc ^ ouf ^ embeddi ^ method for ^^0,, 

"*" (equal to 1/a-e) is obtained. If "*" is within the without the host image is shown in FIG. 15. A key compo- 
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nent of this method is embedding using multidimensional 
lattices as previously described. Signature and host images 
are transformed using the block Discrete Cosine Transform
(DCT). The block size chosen is 8x8 pixels. The signature
coefficients are quantized in two steps: first by using the
standard JPEG quantization matrix, and then by a user
specified signature quantization matrix. The signature
quantization matrix determines the relative size of signature data
compared to the host data, thus controlling the quantity and
quality of the embedded data as described in Section 5.1.
These quantized signature coefficients are then encoded 
using the multidimensional lattices and inserted into the host 
DCT coefficients. This insertion is adaptive to the local 
texture content of the host image blocks and controlled by 
the block texture factor as described in Section 5.2. The 
steps in embedding are summarized in Section 5.3. 

5.1 Signature Image Quantization 

There is clearly a trade-off between data embedding
quantity and quality of reconstruction. The method discussed
below provides a simple scheme for quantizing signature
image data using the block DCT quantization matrix.
This approach enables robust recovery of signature data 
when the embedded image is subject to JPEG compression. 

Consider an 8x8 DCT coefficient matrix. From image 
compression and information theory, it is well known that 
low frequency coefficients require more bits than the high 
frequency ones. One such quantization matrix indicating the 
number of quantization levels for each of the sixty-four 
coefficients is shown in FIG. 16. These quantized coeffi- 
cients are embedded in a lattice structure as described in the 
previous section. For simplicity, we will consider only those 
shells in the lattice structure whose elements are {±1, 0}. 
One way of distributing these coefficients is as follows: 

5.1.1 Quantization Level = 1232. Use Lattice type $E_8$: The
first and second shells of the lattice combined have 2400
code words; however, here we use 1232 code words
from the combination of the first shell and part of the
second shell in this lattice. Since an $E_8$ code has eight
components, it requires 8 host coefficients to embed
one $E_8$ code. There are 3 coefficients with this
quantization, requiring 24 host coefficients to embed.

5.1.2 Quantization Level = 342. Use Lattice type $E_6$: The
first and second shells of $E_6$ contain 342 code words.
Six host coefficients are needed to embed an $E_6$ code.
The six coefficients in the DCT matrix with this
quantization thus need 36 host image coefficients to embed.

5.1.3 Quantization Level = 48. Use Lattice type $D_4$: The
first two shells of $D_4$ are used to encode 48 levels. Each
code requires four host coefficients. There are thirteen
coefficients with this quantization, thus requiring 52
host coefficients.

Thus, the method outlined above needs a total of 112 host
coefficients to embed the 64 DCT coefficients from the
signature image.
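
The figure of 1232 code words for $E_8$ can be checked by counting. The sketch below is an illustration only and rests on one assumption: that the restriction to elements {±1, 0} means every coordinate of a code vector is -1, 0, or 1. Such vectors have integer coordinates, so they belong to $E_8$ exactly when their coordinate sum is even. The function name is chosen here.

```python
from itertools import product


def count_unit_entry_codes(dim, sq_norms):
    """Count vectors with entries in {-1, 0, 1}, even sum, and given squared norms."""
    total = 0
    for vec in product((-1, 0, 1), repeat=dim):
        sq_norm = sum(c * c for c in vec)
        if sq_norm in sq_norms and sum(vec) % 2 == 0:
            total += 1
    return total


# First shell of E8 has squared norm 2, second shell has squared norm 4.
print(count_unit_entry_codes(8, {2}))     # 112
print(count_unit_entry_codes(8, {2, 4}))  # 1232
```

The count splits as 112 vectors of type (±1, ±1, 0, ..., 0) on the first shell plus 1120 vectors with four nonzero entries on the second shell.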

The next step in embedding is to identify the host coefficients
which are affected by the data embedding procedure.
The low frequency components contain most of the host
signal energy, but they cannot be easily modified because such
changes may become visible. The high frequency
components, which usually pack the least amount of energy,
could be easily removed by signal processing operations.
This leaves us with the mid frequency components.

Consider an 8x8 block of host image coefficients, as
shown in FIG. 17. The shaded regions indicate the frequency
components that are identified for encoding the signature
image data. In this example, 28 host coefficients are used in
each block, so four host DCT blocks (4x28=112) are needed
to embed one 8x8 signature DCT block.

Another example of signature image quantization and the
corresponding host coefficient allocation is shown in FIG.
18 and FIG. 19. Note that 192 host coefficients are needed
for this case (6 coefficients for $E_8$, 16 for $E_6$, and 12 for $D_4$,
giving 6x8+16x6+12x4=192). One possible way of distributing
this is shown in FIG. 19, where 12 host coefficients are
identified for insertion in each block. This requires a total of
16 host DCT blocks per signature block.
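
The host coefficient budgets of both examples follow from the lattice dimensions. A minimal Python sketch of the arithmetic, with the helper name `host_budget` and the dictionary layout chosen here for illustration:

```python
from math import ceil

# Host coefficients consumed per embedded code word, i.e. the lattice dimension.
LATTICE_DIM = {"E8": 8, "E6": 6, "D4": 4}


def host_budget(allocation, host_coeffs_per_block):
    """Return (host coefficients needed, host DCT blocks per signature block)."""
    total = sum(count * LATTICE_DIM[name] for name, count in allocation.items())
    return total, ceil(total / host_coeffs_per_block)


# Allocation of FIG. 16 and FIG. 17: 28 usable host coefficients per block.
print(host_budget({"E8": 3, "E6": 6, "D4": 13}, 28))   # (112, 4)
# Allocation of FIG. 18 and FIG. 19: 12 usable host coefficients per block.
print(host_budget({"E8": 6, "E6": 16, "D4": 12}, 12))  # (192, 16)
```

With four host blocks per signature block the signature is 25% of the host size, and with sixteen it is 6.25%, which is consistent with the host sizes used in Example 4 for a fixed signature size.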

5.2 Texture Masking 

The signature coefficients are adaptively embedded into
the host image coefficients. Recall that insertions into host
image regions with low texture information would result in
visible distortions. The texture block factor $\gamma$ controls the
weighting of the signature coefficients for each 8x8 DCT
host image block. We use a normalized measure of texture
energy, defined as:

where $\mu(B)$ is the average energy in band B ($B \in \{LH, HL,
HH\}$) after a one level discrete wavelet decomposition of the
host image, and $\mu_b(B)$ is the average energy in band B of a
given 8x8 host image block. The normalized term characterizes
the given block texture energy for a given band B. A Haar
wavelet transform was used in our experiments. If the normalized
energy exceeds a given threshold, say $T_H(B)$, then the
corresponding block is considered to have significant texture in
band B. If the block texture energy exceeds the threshold for two
out of three bands, then the block is considered to be highly
textured. Similarly, if two out of three band energies fall
below the threshold $T_L(B)$, then the corresponding block is
considered to be low in texture.

Each host image DCT block is thus classified as a
highly textured, normal, or low textured block, and the
texture block factor is set accordingly. In the example
discussed below the following parameter values are used:

$T_H(B) = 4/3,\ \forall B$; $T_L(B) = 3/4$; $\gamma(\text{high}) = 2$; $\gamma(\text{normal}) = 0$; $\gamma(\text{low}) = -2$
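
The two-out-of-three classification rule can be sketched as follows. This is a hedged illustration, not the method's definitive form: the normalized band energies are taken as given inputs because the defining equation is not legible here, and the default thresholds (4/3 and 3/4) and factors (2, 0, -2) are one reading of the parameter line above and should be treated as placeholders. The function name is hypothetical.

```python
def texture_factor(normalized_energy, t_high=4 / 3, t_low=3 / 4,
                   gammas=(2, 0, -2)):
    """Classify a host block from its normalized energy in LH, HL, and HH.

    Returns (label, gamma), where gamma is the texture block factor.
    """
    bands = ("LH", "HL", "HH")
    n_high = sum(normalized_energy[b] > t_high for b in bands)
    n_low = sum(normalized_energy[b] < t_low for b in bands)
    if n_high >= 2:          # significant texture in two of three bands
        return "high", gammas[0]
    if n_low >= 2:           # little texture in two of three bands
        return "low", gammas[2]
    return "normal", gammas[1]


print(texture_factor({"LH": 2.0, "HL": 1.5, "HH": 0.5}))  # ('high', 2)
print(texture_factor({"LH": 0.5, "HL": 0.6, "HH": 1.0}))  # ('low', -2)
print(texture_factor({"LH": 1.0, "HL": 1.0, "HH": 2.0}))  # ('normal', 0)
```

Because only three bands are tested, a block cannot satisfy both the high and the low condition at once, so the order of the two checks does not matter.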

5.3 Data Embedding 

We can now summarize the various steps in the embed- 
ding procedure. FIG. 20 provides a schematic of the encoder 
block 42 of FIG. 15 to show the encoding steps. 

5.3.1 The host and signature images are transformed to the 
DCT domain. A block size of 8x8 is used in the 
example given below. 

5.3.2 Each block of 8x8 host image pixels is analyzed for 
its texture content and the corresponding texture block 
factor y is computed. 

5.3.3 The signature coefficients are quantized according to 
the signature quantization matrix and the resulting 
quantized coefficients are encoded using lattice codes. 
The lattice codes are so chosen that the code vectors 
contain only ±1 or zeros. 

5.4 The signature codes are then appropriately scaled
using the total scale factor $\delta = \alpha + \gamma$ and the commonly
used JPEG quantization matrix. The JPEG quantization
matrix helps in renormalizing the code vectors so that
they have a similar dynamic range as a typical DCT
block. Note that $\delta \ge 0$, which in turn constrains the
choice of $\alpha$ and $\gamma$.

5.5 The selected host coefficients are then replaced by the
scaled signature codes and combined with the original
(unaltered) DCT coefficients to form a fused block of
DCT coefficients. Note that more than one host coefficient
is needed to encode a single signature code.

5.6 The fused coefficients are then inverse transformed to
give an embedded image. As discussed earlier, the
choice of signature quantization matrix affects the
quantity and quality of the embedded data. Choice of
the scale parameter $\alpha$ depends on the application. A
larger value for $\alpha$ results in a more robust embedding
at the cost of quality of the embedded image, i.e., there
could be perceivable distortions in the embedded
image. A smaller $\alpha$ may result in poor quality recovered
signature when there is a significant compression of the
embedded image.

EXAMPLE 4

We used two different sizes for the host image. For
embedding using the signature quantization matrix of FIG.
16 and FIG. 17, a 256x256 host image was used, resulting
in 25% data embedding. A 512x512 host image was used
with the quantization matrix of FIG. 18 and FIG. 19.

We examined the embedded images with and without
texture masking. The signature quantization matrix shown in
FIG. 18 and FIG. 19 was used for this purpose. We found
that texture masking reduces visible distortions in regions
that are flat.

We also examined recovered host and signature images
for two different quantizations of the signature data, using
texture masking. In this case, the embedded images were
lossy compressed by JPEG to 89%. Obviously, the quantization
matrix of FIG. 18 and FIG. 19 yields better results
than the one shown in FIG. 16 and FIG. 17 at the cost of
more host bits per signature coefficient.

Finally, FIG. 21 and FIG. 22 show the quality of the
embedded and recovered images using the PSNR as a
measure. It is clear from these graphs that one can achieve
better quality embedding using the quantization matrix of
FIG. 18 and FIG. 19 at the cost of lower bit rate for the
hidden data. We found that even at 25% embedding, one can
recover visually acceptable quality results for up to 90%
lossy compression using JPEG.

It will be seen, therefore, that the invention provides a
robust data hiding technique for embedding images in
images. A key component of the scheme is the use of
multidimensional lattice codes for encoding signature image
coefficients before inserting them into the host image DCT
coefficients. Texture masking is used to reduce distortions in
the embedded image by adaptively controlling the weights
associated with the hidden data. The hidden signature data
can be recovered in the absence of the original host image.
Experimental results show that this method is robust to lossy
image compression using JPEG. One can trade off quantity
for quality of the embedded image by choosing appropriate
signature quantization matrices.

6. Hiding Speech in Video

In order to hide speech in video in accordance with the
present invention, the host video is wavelet transformed
frame by frame, and vectors of coefficients are perturbed
using lattice channel codes to represent hidden vector
quantized speech. The embedded video is subjected to H.263
compression before retrieving the hidden speech from it.
The retrieved speech is intelligible even with large
compression ratios of the host video.

FIG. 23 presents a basic schematic 50 of the data hiding
and watermarking problem as it applies to hiding speech in
video. The original host is modified using the signature data
in a deterministic fashion before distribution. As a result of
embedding, a mean-squared-error $\mathrm{MSE}_H$ is introduced into
the embedded host. To ensure transparency of embedding,
the $\mathrm{MSE}_H$ value should be kept low.
While in watermarking the allowable $\mathrm{MSE}_H$ is very small,
and so is the amount of data, in data hiding the
focus is more on hiding larger amounts of signature data at
the expense of a larger $\mathrm{MSE}_H$. After distribution,
the host typically undergoes compression or other standard
transformations. The extraction process may or may not,
depending on the nature of the application, require knowledge
of the original host to estimate the hidden signature
from the "noisy" embedded host that is received. After
extraction, it is desired that the channel mean-squared-error
$\mathrm{MSE}_S$ between the original signature and the extracted
signature be as low as possible.

From the discussion of data hiding techniques so far, it
will be appreciated that the above dual problems of data
hiding and watermarking readily map to the source and
channel coding problem in digital communications. As such,
established concepts from digital communications could be
applied to this problem.

6.1 Data Hiding using Vector Perturbations

According to the present invention, the host data is
orthogonally transformed before embedding the hidden
signature in it. The transform is not essential because a raw
image or video is by itself an expansion on the standard
bases. However, it may lead to some advantages. Let us
consider a host data source $(X_1, X_2, \ldots, X_N)$ transformed
orthogonally to a set of N coefficients $(C_1, C_2, \ldots, C_N)$. The
transform-domain embedding process perturbs the coefficients
into a new set of coefficients given by $(\hat{C}_1, \hat{C}_2, \ldots,
\hat{C}_N)$. The inverse transformation then yields the embedded
host $(\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_N)$. Since the transformation is
orthogonal, the mean-squared-error introduced in the coefficients
is exactly equal to the mean-squared-error introduced
in the host data. That is,

$$\mathrm{MSE}_H = \frac{1}{N}\sum_{i=1}^{N} |\hat{X}_i - X_i|^2 = \frac{1}{N}\sum_{i=1}^{N} |\hat{C}_i - C_i|^2 \qquad (11)$$

Now, a transparency constraint is imposed on the value of
$\mathrm{MSE}_H$. This specifies a maximum value P which upper
bounds $\mathrm{MSE}_H$ for a given application:

$$\frac{1}{N}\sum_{i=1}^{N} |\hat{X}_i - X_i|^2 < P \;\Rightarrow\; \frac{1}{N}\sum_{i=1}^{N} |\hat{C}_i - C_i|^2 < P \qquad (12)$$

The smaller the value of P, the more transparent the embedding
is, and vice-versa.

Since N is typically very large for images and video, it
makes sense to simplify the transparency constraint by
grouping the N coefficients into k-dimensional vectors with
$k \ll N$, and satisfying the constraint in each of the vectors
individually. Further, it may be necessary to perturb only a
limited number M of the N coefficients, say the coefficients
in only one particular band of a subband or wavelet
decomposition. That is, if the M coefficients to be perturbed are
grouped into M/k vectors of dimension k, denoted as $v_j$, j = 1,
2, . . . , M/k, and the corresponding perturbed vectors are
denoted as $\hat{v}_j$, then for each of the vectors, the following
must hold to satisfy the constraint in Equation (12):

$$\frac{1}{k}\,\|v_j - \hat{v}_j\|^2 < P_c = \frac{N}{M}\,P, \quad \forall\, j = 1, 2, \ldots, M/k \qquad (13)$$

At this stage we can explain the general embedding
principle by means of the diagram in FIG. 24. The signature
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data is first coded, either losslessly or lossily, to generate a 
sequence of symbols from a Q-ary alphabet {s lf . . . ,s c }. 
The embedding process injects one symbol in each coeffi- 
cient vector Vy , by perturbing it in one of Q possible ways 
in k-dimensional space to obtain the perturbed vector V ; . 5 
Note that the possible values of V, all lie within a shell of 
radius vlcF^ from V ; ., to satisfy the transparency constraint. 
The possible perturbations constitute what is in general 
known as the channel codebook, of size Q and dimension k. 
The channel codebook is usually obtained from a noise- 10 
resilient channel code by scaling it by a factor a which 
determines the transparency constraint. That is, the per- 
turbed vectors are obtained as: 

tr r VjwC(sb (14) 15 

where the set of vectors C(s^), i=l, 2, . . . , Q constitute a 
channel shape codebook of size Q. The perturbed coeffi- 
cients are used to inverse transform the host before trans- 
mission or distribution. 

The extraction principle is outlined in FIG. 25. Let us say
that the jth distributed perturbed vector $\hat{v}_j$, corresponding
to a symbol $s_i$, has been received as $w_j$ as a result of an
additive noise $n_j$ due to compression and other transformations.
However, as long as the received vector does not go
beyond certain pre-determined decision boundaries for symbol
$s_i$, the correct transmitted symbol $s_i$ will still be
extracted, provided the true original host is known. The
recovery process thus extracts from each vector the symbol
within whose decision boundaries the received vector lies.
In other words, a nearest neighbor search with an appropriate
distance measure is used. The decision boundaries
depend on the statistical model chosen for the additive noise.
The sequence of extracted symbols is then decoded to
obtain the extracted signature.
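
The embedding rule of Equation (14) and the nearest-neighbor extraction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the invention: the channel code is taken to be the 24-point first shell of the D4 lattice, the original host vector is assumed to be available at the receiver, and the function names, host values, scale factor, and noise vector are all hypothetical.

```python
import itertools
import math

def d4_first_shell():
    """The 24 points of the form (+-1, +-1, 0, 0) in all positions."""
    points = []
    for i, j in itertools.combinations(range(4), 2):
        for si, sj in itertools.product((1, -1), repeat=2):
            p = [0, 0, 0, 0]
            p[i], p[j] = si, sj
            points.append(tuple(p))
    return points

def embed(v, symbol, codebook, alpha):
    """Equation (14): perturbed vector = v + alpha * C(symbol)."""
    return [x + alpha * c for x, c in zip(v, codebook[symbol])]

def extract(w, v, codebook, alpha):
    """Nearest-neighbor search on (w - v) / alpha, Euclidean distance."""
    e = [(wi - vi) / alpha for wi, vi in zip(w, v)]
    return min(range(len(codebook)), key=lambda s: math.dist(e, codebook[s]))

codebook = d4_first_shell()
host = [12.0, -7.5, 3.25, 40.0]      # hypothetical coefficient vector
alpha = 4.0                          # hypothetical scale factor
symbol = 17                          # symbol index in 0..23
sent = embed(host, symbol, codebook, alpha)
noise = [0.5, -0.5, 0.5, -0.5]       # stand-in for compression noise
received = [x + n for x, n in zip(sent, noise)]
recovered = extract(received, host, codebook, alpha)
```

With these values the noise vector has norm 1, well inside half the scaled minimum distance between code points, so the recovered index matches the embedded one.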

Some comments are now in order. First, we can define a
rate $R$ for data injection in bits/dimension as follows:

$$R = \frac{1}{k}\log_2 Q \qquad (15)$$

Next, assuming an i.i.d. additive white Gaussian noise
(AWGN) model for the pixels in the distributed host, and
therefore its orthogonal transform coefficients, the extraction
process becomes a simple nearest-neighbor encoder with the
Euclidean distance measure and symmetric-hyperplane
decision boundaries. Moreover, if we assume the AWGN
variance to be $\sigma^2$, we can define a channel capacity, $C$, as:

$$C = \frac{1}{2}\log_2\left(1 + \frac{P_c}{\sigma^2}\right) \text{ bits.}$$

Thus, $P_c$, obtained by scaling the transparency constraint $P$
by a factor $(N/M)$, can be viewed as the power constraint on
the channel. According to Shannon's celebrated theorem, as
long as $R < C$, virtually error-free transmission can be
achieved by choosing a sufficiently large dimension k. The term
$C$ is the theoretical upper-bound on the error-free rate an AWGN
channel can sustain for a given power constraint.
Unfortunately, the upper-bound can only be achieved for
infinite dimensionality k. In practice, the larger the
dimension k, the more noise resilient the channel coder is.
Therefore, the dimensionality of the vectors should be
increased as much as possible.
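
As a rough numerical companion to Equation (15) and the capacity formula, the following Python sketch computes the injection rate of a channel code and the smallest ratio $P_c/\sigma^2$ for which $R < C$ can hold. The function names are hypothetical, and the (Q, k) pairs are simply the shell sizes and dimensions quoted in the examples later in this description.

```python
import math

def injection_rate(Q, k):
    """Equation (15): bits per dimension for a Q-point code in k dimensions."""
    return math.log2(Q) / k

def awgn_capacity(P_c, sigma2):
    """AWGN capacity for power constraint P_c and noise variance sigma2."""
    return 0.5 * math.log2(1.0 + P_c / sigma2)

def min_snr_for_rate(R):
    """Smallest P_c / sigma^2 at which the capacity equals the rate R."""
    return 2.0 ** (2.0 * R) - 1.0

# (Q, k) pairs taken from the first shells named in the examples.
codes = {"D4": (24, 4), "E8": (240, 8), "Lambda16": (4320, 16)}
for name, (Q, k) in codes.items():
    R = injection_rate(Q, k)
    print(f"{name}: R = {R:.3f} bits/dim, needs P_c/sigma^2 > {min_snr_for_rate(R):.2f}")
```

The threshold is only a necessary condition from Shannon's bound; a finite-dimensional code needs a larger margin in practice.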

Finally, with increase in the amount of signature data, it
makes sense to lossily source code the data if it is
compressible. A method that works well for correlated sources is
vector quantization. The indices obtained by vector
quantization are embedded into the host transform coefficients by
vector perturbations derived from noise-resilient channel
codes. Note that it is also possible to design channel-
optimized VQs (COVQ), or Power-Constrained COVQs
(PCCOVQ) for better noise performance.

In the present invention, the channel codes are chosen as
subsets of lattices in multiple dimensions. It is known that
the lattices $D_4$, $E_6$, $E_8$, $K_{12}$, $\Lambda_{16}$, etc. produce
very good channel codes in their respective dimensions, and
tables and graphs with their nominal coding gain results are
commonly available.

Most of our implementations are based on spherical or
constant-energy codes, for which all the points are
equidistant from the origin. With such codes, the $\mathrm{MSE}_H$
introduced as a result of embedding is exactly equal to the
transparency constraint. In practice, however, for image and
video hosts, the effect of rounding the pixels of the embedded
host to integers, and limiting them to lie in the range of 0-255,
may cause minor deviations from the theoretical value.
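
The relation between the scale factor and the transparency constraint for a spherical code can be made explicit. The following short derivation is a sketch that assumes every code point has the same squared norm, written here as $r^2$ (a symbol introduced only for this illustration), and that the constraint of Equation (13) is met with equality:

```latex
\frac{1}{k}\lVert \hat{v}_j - v_j \rVert^2
  = \frac{1}{k}\lVert \alpha\, C(s_i) \rVert^2
  = \frac{\alpha^2 r^2}{k}
  = P_c
\quad\Longrightarrow\quad
\alpha = \frac{\sqrt{k P_c}}{r}
```

For example, the first shell of the $D_4$ lattice has $k = 4$ and $r^2 = 2$, which under these assumptions gives $\alpha = \sqrt{2P_c}$.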

6.2 Recovery from Video Host without Original

The general principle of data hiding in video is as follows. 
Each frame of a video sequence is orthogonal wavelet 
transformed, and the transform coefficients are grouped into 
vectors. The signature data is vector quantized, and the 
indices are embedded into the coefficient vectors in one or 
more subbands using efficient channel codes. The same 
hidden data may be repeated in a few successive frames to 
introduce robustness to low frame rate compression of 
video. Note that the frame by frame approach fits very well 
with the frame-based compression technology currently in
vogue. 

We now focus on the issue of choice of subband for 
embedding the hidden data. When the original host is 
available during retrieval, and the kind of host transforma- 
tion we are most concerned with is compression, hiding data 
in the lower subbands has several distinct advantages. Most 
modern compression schemes quantize the lower bands
finely, and in some way exploit the fact that the higher bands 
have very little energy. Injecting extraneous information 
only in the lower bands, and leaving the higher bands 
untouched, therefore, reduces the probability of destruction 
of the hidden information, and at the same time does not 
effect any significant change in the coding efficiency.
Although a disadvantage is that the distortions introduced by 
embedding may be perceptually more severe, weighing the 
pros and the cons, hiding data in the lower subbands is still 
found to be better. 

If, however, extraction is to be made possible without 
knowledge of the original host, hiding data in the lower 
bands is not appropriate. The key idea behind a data hiding 
scheme that allows extraction without the original host, is to 
convert the original host conveniently before embedding to 
a slightly different one, and to use that as the base host for 
embedding, instead of the true original. The modification 
introduced must be such that it becomes possible to estimate 
the base perturbed vectors from the received host, with the 
modified base host being only trivially dissimilar to the true 
original. Natural images typically have very low energy in 
the high-high bands. Therefore, a simple zeroing out of one 
or more of the high-high bands, introduces a very low MSE, 
and for most images, affects image detail only inconspicu- 
ously in the perceptual sense. If a modified base host is 
obtained by zeroing out one or more of the subbands of the 
original host, the extraction process only needs to use the 
zero-vector as the estimation base for the perturbed vectors 
it receives within these subbands. This however, contradicts 
the requirements in the previous paragraph, that it is better
to embed data in the lower subbands. To make a
compromise, the following methodology is adopted. As 
shown in FIG. 26, a two-stage wavelet decomposition of 
each frame is made and the data is hidden in the shaded 
LL-HH subband after zeroing. 
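
The consequence of zeroing the embedding subband is that the receiver can decode against the zero vector instead of the original host. The Python sketch below illustrates only that point. It uses a toy four-point codebook in two dimensions as a stand-in for a real lattice shell, and the function names, scale factor, and noise values are hypothetical.

```python
import math

# Toy 2-D spherical codebook (hypothetical, for illustration only).
TOY_CODEBOOK = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def embed_in_zeroed_band(symbols, codebook, alpha):
    """Base vectors are zero after zeroing, so each vector is alpha * C(s)."""
    return [[alpha * c for c in codebook[s]] for s in symbols]

def extract_blind(received, codebook, alpha):
    """Decode each received vector using the zero vector as estimation base."""
    decoded = []
    for w in received:
        e = [x / alpha for x in w]
        decoded.append(min(range(len(codebook)),
                           key=lambda s: math.dist(e, codebook[s])))
    return decoded

symbols = [2, 0, 3, 1]
alpha = 3.0
sent = embed_in_zeroed_band(symbols, TOY_CODEBOOK, alpha)
noise = [0.4, -0.6]                  # stand-in for compression noise
received = [[x + n for x, n in zip(v, noise)] for v in sent]
decoded = extract_blind(received, TOY_CODEBOOK, alpha)
```

No host data appears anywhere in `extract_blind`, which is the property the zeroing approach is meant to provide.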

It is appropriate to make a comment on the zeroing out 
approach described above. Zeroing out one or more bands 
before embedding may result in significant distortions or 
loss of detail for some host videos. A greater transparency of
embedding may be achieved if the coefficients in the con- 
cerned subbands in the base host are predicted, linearly or 
non-linearly, from the coefficients in the other subbands that 
are not zeroed out. Specifically, if the prediction used is 
linear, and the noise is assumed to be additive i.i.d. 
Gaussian, it can be shown that the noise in the predicted base 
coefficients will still be Gaussian. The estimation of the 
transmitted symbols will then be essentially the same prob- 
lem as before, but at a higher noise level. In general 
however, linear prediction across subbands does not lead to 
any significant advantages. Obtaining the best nonlinear 
prediction across subbands, on the other hand, is a very 
difficult problem. Further, this leads to the difficult problem 
of estimation of the base coefficients in the embedded 
subbands, from the already noisy coefficients in the other 
subbands, at the retrieval end. In this case, the predicted base 
coefficients will no longer be Gaussian, and consequently, 
the decision boundaries for extraction may be very complex. 
In this work, we have sidestepped the issues involved by
adopting a simple zeroing out approach, which works very 
well in practice. 

FIG. 27 and FIG. 28 show schematic diagrams 60, 70 for 
the embedding and extraction mechanism outlined above, 
respectively. The host video is first wavelet transformed. An 
encryption key is used to pseudo-randomly shuffle the 
coefficients in the subband chosen for embedding before 
grouping them into k-dimensional vectors. The hidden com- 
pressible data is appropriately vector quantized, and the 
indices obtained in the process are embedded into the 
k-dimensional host transform vectors by vector perturbations
in accordance with efficient channel codes scaled by a
factor $\alpha$. The encryption key based shuffling introduces an
additional layer of security apart from the security enforced 
by the already astronomic variability in the source and 
channel codebooks chosen. It is virtually impossible for 
unauthorized persons who know the algorithm, to pirate the 
hidden information, without knowledge of the source 
codebook, the channel codebook, or the encryption key. 

Another advantage of using pseudo-random shuffling of
coefficients to form vectors is as follows. Typically, the noise
introduced in a frame as a result of transformations such as
compression occurs in "bursts". That is, a heavily corrupted
coefficient is likely to have its neighboring coefficients also
heavily corrupted. Therefore, if adjacent coefficients are
grouped to form vectors, the noise in the components remains
too correlated to fit our assumed model of being independent
and identically distributed. Shuffling implies that the
components of a vector now come from different random parts
of a frame, and therefore, the noise introduced in the
coefficients becomes closer to being i.i.d. This in turn
validates the use of the Euclidean distance measure for channel
decoding.
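
The key-based shuffling and grouping step can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the key is an integer seed, Python's `random` module (which is not cryptographically secure) stands in for whatever keyed pseudo-random generator an actual system would use, and the function names are hypothetical.

```python
import random

def shuffled_vectors(coeffs, key, k):
    """Group coefficients into k-dimensional vectors in a key-dependent order."""
    order = list(range(len(coeffs)))
    random.Random(key).shuffle(order)           # same key gives same order
    usable = len(order) - len(order) % k        # leftover coefficients are skipped
    vectors = [[coeffs[i] for i in order[p:p + k]]
               for p in range(0, usable, k)]
    return vectors, order

def unshuffle(vectors, order, original):
    """Write (possibly perturbed) vector components back to their positions."""
    restored = list(original)
    flat = [x for v in vectors for x in v]
    for pos, value in enumerate(flat):
        restored[order[pos]] = value
    return restored

coeffs = [float(i) for i in range(22)]          # hypothetical subband coefficients
vectors, order = shuffled_vectors(coeffs, key=2024, k=4)
restored = unshuffle(vectors, order, coeffs)
```

The retriever regenerates the same `order` from the shared key, so the grouping never has to be transmitted.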

EXAMPLES 

We implemented a system for hiding 8 kHz sampled
speech at 16 bits/sample in a 30 frames/s QCIF video,
without requiring the availability of the original video for
retrieval. The speech and video were synchronized in time.
Successive samples of speech were vector quantized, and the
indices were embedded into the LL-HH subband coefficients
of the video on a frame-by-frame basis. Temporal redundancy
was incorporated by embedding the same information
in several successive frames, so that the embedding becomes
robust to frame skips during compression.

First, we attempted embedding the signature speech in
only the luminance LL-HH subband. The embedded video
was piped through an H.263 encoder as before, and the
reconstructed video was used to extract the hidden speech
segment. We present the details of three different
implementations with increasing dimensionality of the channel
codes used:

(a) The speech is vector quantized with a codebook of size
576 and dimension 4. The index obtained was decomposed
into two 24-ary symbols, each of which was
embedded into a vector of dimension 4 obtained by
grouping four luminance LL-HH coefficients of a two-
stage wavelet decomposition. The embedding was done
by perturbing the vectors in accordance with a spherical
channel code consisting of the first shell of the $D_4$
lattice (which has 24 points).

(b) The speech codebook is of size 240 and dimension 4.
The index for each speech vector was used to perturb
a group of 8 luminance LL-HH coefficients in accordance
with a spherical channel code comprising the 240
points on the first shell of the $E_8$ lattice.

(c) The speech codebook is of size 4320 and dimension 8.
The encoded index was embedded into a vector of size
16 obtained by grouping 16 luminance LL-HH coefficients.
The channel code comprised the 4320 points on
the first shell of the Barnes-Wall lattice $\Lambda_{16}$.
For all the above implementations, the same information
was repeated in two successive frames to introduce robustness
to low frame rate compression. The News QCIF video
was used as the host for hiding a segment of male speech.
The signal to noise ratio for the extracted speech segment
against the video bit rate after H.263 compression of the host
at 15 frames/s (frameskip=1) is plotted in FIG. 29. The
transparency constraint was the same for all these results. As
expected, the highest dimensional lattice $\Lambda_{16}$ was found to
be most robust to noise.
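
Implementation (a) above splits each vector quantizer index (codebook size 576 = 24 x 24) into two 24-ary symbols. The description does not say how the split is done; the Python sketch below assumes a plain base-24 digit decomposition with the most significant digit first, and the function names are hypothetical.

```python
def index_to_symbols(index, base, count):
    """Split a codebook index into `count` base-`base` symbols (MSB first)."""
    symbols = []
    for _ in range(count):
        index, digit = divmod(index, base)
        symbols.append(digit)
    return symbols[::-1]

def symbols_to_index(symbols, base):
    """Recombine extracted symbols into the codebook index."""
    index = 0
    for s in symbols:
        index = index * base + s
    return index

pair = index_to_symbols(100, base=24, count=2)   # 100 = 4 * 24 + 4
index = symbols_to_index(pair, base=24)
```

Each of the two symbols then selects one point of the 24-point $D_4$ shell for its own coefficient vector.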

We next present the results for three implementations
where both the luminance and the chrominance coefficients
are perturbed:

(a) The speech codebook is of size 5184 and dimension 8.
Each index was decomposed into two 72-ary symbols,
which are embedded into two coefficient vectors of
dimension 6. Each 6-dimensional coefficient vector
was obtained by grouping 4 luminance LL-HH coefficients
and 1 LL-HH coefficient from each chrominance
component. A spherical channel code derived from the
first shell of the $E_6$ lattice (which also has 72 points)
was used for each symbol.

(b) The speech is vector quantized with a codebook of size
756 and dimension 8. A 12-dimensional coefficient
vector was obtained by grouping 8 luminance LL-HH
coefficients and 2 LL-HH coefficients from each
chrominance component. A spherical channel code
consisting of the 756 points on the first shell of the
Coxeter-Todd lattice $K_{12}$ was used.

(c) The speech is vector quantized with a codebook of size
4096 and dimension 16. A 24-dimensional coefficient
vector was obtained by grouping 16 luminance LL-HH
coefficients and 4 LL-HH coefficients from each
chrominance component. A spherical channel code
$G_{24}$, consisting of 4096 points, was used. $G_{24}$ was
obtained from the (24, 12) extended Golay code by
converting zeroes to ones, and ones to negative ones.
For all the above implementations, the same information
was repeated in four successive frames. FIG. 30 presents the
retrieval SNR vs. bit rate results for the above methods when
a segment of female speech was hidden in a Grandmother
QCIF video, which was then coded by H.263 at 7.5 frames/s
(frameskip=3). The transparency constraint was the same for
all these results. As expected, the highest dimensional lattice
$G_{24}$ was found to be most robust to noise.

As can be seen therefore, the foregoing provides a generic 
framework for hiding compressible data in host video. Our 
MSE-optimal quantitative treatment is motivated by the 
identification of the similarity of the data hiding problem
with the source and channel coding problem in digital 
communications. While the generic approach can be used 
successfully for the case when the original host is available 
to the retriever, the true potential of data hiding lies in being 
able to extract the hidden data without using the original 
host. The above method is readily adapted to allow this, 
making possible invisible mixing of different kinds of hid- 
den data, with standard forms of open data transmission. 

Although the description above contains many 
specificities, these should not be construed as limiting the 
scope of the invention but as merely providing illustrations 
of some of the presently preferred embodiments of this 
invention. Thus the scope of this invention should be deter- 
mined by the appended claims and their legal equivalents. 

TABLE 1

Code Types and structure of the D4 lattices

Shell No.   Squared Norm   Source codes                           Number of codes
1           2              (±1, ±1, 0, 0)^P                       24
2           4              (±2, 0, 0, 0)^P, (±1, ±1, ±1, ±1)^P    24
3           6              (±2, ±1, ±1, 0)^P                      96
4           8              (±2, ±2, 0, 0)^P                       24
5           10             (±2, ±2, ±1, ±1)^P, (±3, ±1, 0, 0)^P   144


TABLE 2

Quantizer Level (D4 lattice)

Quantizer Levels $\beta$   Lattice points in channel code
2                          (0, 0, 1, 1), (0, 0, -1, -1)
24                         Shell 1
32                         Shell 1, (±2, 0, 0, 0)^P
48                         Shell 1, Shell 2
144                        Shell 1, Shell 2, Shell 3
168                        Shell 1, Shell 2, Shell 3, Shell 4


What is claimed is: 

1. A method for embedding a signature image in a host 
image, comprising: 



(a) performing a single level discrete wavelet transform
decomposition of said signature image and said host
image;

(b) quantizing into $\beta$ levels each coefficient of said
signature image by grouping a set of n coefficients in
the host image to form an n-dimensional vector, and
perturbing said vector according to a $\beta$-ary channel
code comprising a subset of an n-dimensional lattice
scaled by a factor $\alpha$;

(c) embedding each subband of said signature image into
a corresponding subband of said host image to produce
a composite image;

(d) subtracting the coefficients of said host image from the
coefficients of the composite image to obtain noisy
perturbations;

(e) grouping the resulting coefficients into groups of n to
obtain a vector $e$;

(f) scaling said vector $e$ by $1/\alpha$ to produce a resulting
vector $(1/\alpha)e$;

(g) nearest-neighbor encoding $(1/\alpha)e$ to find an index i of
the channel code nearest to it in Euclidean distance;

(h) obtaining a quantized discrete wavelet transform
coefficients from said index i.

2. A method for embedding an audio signature in a host
video image, comprising:

(a) encoding said audio signature to generate a sequence
of symbols from a Q-ary alphabet $\{s_1, s_2, \ldots, s_Q\}$;

(b) injecting one symbol in each coefficient vector $v_j$, by
perturbing it in at least one of Q possible ways in
k-dimensional space to obtain the perturbed vector $\hat{v}_j$;
and

(c) using perturbed coefficients to inverse transform said
host video image and produce a composite signal.

3. A method as recited in claim 2, further comprising
extracting from each perturbed vector the symbol within
whose decision boundaries the vector of the composite
signal lies.

*****